Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version

نویسندگان

  • István Szita
  • András Lörincz
چکیده

In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm, maintains an empirical model of the FMDP in a conventional way, and always follows a greedy policy with respect to its model. The only trick of the algorithm is that the model is initialized optimistically. We prove that with suitable initialization (i) FOIM converges to the fixed point of approximate value iteration (AVI); (ii) the number of steps when the agent makes non-near-optimal decisions (with respect to the solution of AVI) is polynomial in all relevant quantities; (iii) the per-step costs of the algorithm are also polynomial. To our best knowledge, FOIM is the first algorithm with these properties. This extended version contains the rigorous proofs of the main theorem. A version of this paper appeared in ICML’09.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs (Extended Version)

Many exact and approximate solution methods for Markov Decision Processes (MDPs) attempt to exploit structure in the problem and are based on factorization of the value function. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are restricted in the problem s...

متن کامل

Efficient Solution Algorithms for Factored MDPs

This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MD...

متن کامل

Incremental Structure Learning in Factored MDPs with Continuous States and Actions

Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transitio...

متن کامل

Equivalence notions and model minimization in Markov decision processes

Many stochastic planning problems can be represented using Markov Decision Processes (MDPs). A difficulty with using these MDP representations is that the common algorithms for solving them run in time polynomial in the size of the state space, where this size is extremely large for most real-world planning problems of interest. Recent AI research has addressed this problem by representing the ...

متن کامل

Efficient Reinforcement Learning in Factored MDPs

We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E3 algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning, and the graphical structure (but not the parameters) o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/0904.3352  شماره 

صفحات  -

تاریخ انتشار 2009